Summary

The $\beta$-model provides a convenient tool for analyzing network data, and Chatterjee, Diaconis and Sly (2011) recently established the consistency of the maximum likelihood estimate (MLE) in the $\beta$-model when the number of vertices goes to infinity. In this note, by effectively approximating the inverse of the Fisher information matrix, we further obtain its asymptotic normality under mild conditions. Simulation studies and a data example are also provided to illustrate the theoretical results.

1. Introduction



For an undirected random graph on $t$ vertices, the $\beta$-model (Chatterjee, Diaconis and Sly, 2011) assumes that for each $1 \le i \ne j \le t$, there exists an edge between $i$ and $j$ with probability

$$p_{i,j} = \frac{e^{\beta_i+\beta_j}}{1+e^{\beta_i+\beta_j}},$$

independently of all other edges, where $\beta_i$ is the influence parameter of vertex $i$. This model of random graphs is closely related to the Bradley-Terry model for rankings (Bradley and Terry, 1952) and is actively used for analyzing network data (Newman et al., 2001; Jackson, 2008; Robins et al., 2007). For many real-world networks, the number of vertices $t$ is large, and hence it is necessary to consider the asymptotics with a diverging number of vertices. In the Bradley-Terry model for paired comparisons, Simons and Yao (1999) proved that the MLE is consistent and asymptotically normal when the number of parameters goes to infinity. This contrasts with the well-known Neyman-Scott problem, under which the MLE fails even to attain consistency when the number of parameters goes to infinity. More recently, Chatterjee, Diaconis and Sly (2011) proved that the MLE of the $\beta$-model is consistent when the number of vertices $t$ goes to infinity. In this note, by effectively approximating the inverse of the Fisher information matrix, we further establish its asymptotic normality under mild conditions.

The rest of the paper is organized as follows. In Section 2, we present a central limit theorem in the $\beta$-model. Simulation studies and a real example are given in Section 3. All the proofs are relegated to the Appendix.
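To make the model definition concrete, the following is a minimal sketch of how one graph could be drawn from the $\beta$-model. The function names, the seed and the parameter values are illustrative assumptions and do not come from the paper.

```python
import math
import random


def edge_probability(beta_i, beta_j):
    """Edge probability e^{b_i+b_j} / (1 + e^{b_i+b_j}) of the beta-model."""
    return 1.0 / (1.0 + math.exp(-(beta_i + beta_j)))


def sample_beta_model(beta, rng):
    """Draw one undirected simple graph; returns a symmetric 0/1 adjacency matrix."""
    t = len(beta)
    adj = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i + 1, t):
            # Each pair is linked independently of all other pairs.
            if rng.random() < edge_probability(beta[i], beta[j]):
                adj[i][j] = adj[j][i] = 1
    return adj


def degrees(adj):
    """Degree sequence d = (d_1, ..., d_t) of the sampled graph."""
    return [sum(row) for row in adj]


rng = random.Random(0)        # fixed seed, for reproducibility only
beta = [-0.5, 0.0, 0.5, 1.0]  # hypothetical influence parameters
adj = sample_beta_model(beta, rng)
print(degrees(adj))
```

Vertices with larger influence parameters tend to receive larger degrees, which is the feature the model is designed to capture.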



2. Main results 

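Suppose that $\mathcal{G}$ is an undirected graph on $t$ vertices generated from the $\beta$-model, where $\beta = (\beta_1, \ldots, \beta_t)^T \in \mathbb{R}^t$ is unknown. Let $d_1, \ldots, d_t$ be the degrees of the vertices of $\mathcal{G}$; the likelihood is then

$$\frac{\exp\big(\sum_{i=1}^{t} \beta_i d_i\big)}{\prod_{1 \le i < j \le t} \big(1+e^{\beta_i+\beta_j}\big)}.$$

The maximum likelihood estimate $\hat\beta$ of $\beta$ can be obtained by solving the equations

$$d_i = \sum_{j \ne i} \frac{e^{\hat\beta_i+\hat\beta_j}}{1+e^{\hat\beta_i+\hat\beta_j}}, \quad i = 1, \ldots, t. \qquad (1)$$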
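The likelihood equations (1) have no closed-form solution. One simple way to solve them numerically is to rearrange (1) as $\hat\beta_i = \log d_i - \log \sum_{j \ne i} (e^{-\hat\beta_j} + e^{\hat\beta_i})^{-1}$ and iterate. The sketch below is one possible implementation and not necessarily the procedure used for the results in this paper; the function names and tolerances are assumptions, and it presumes $0 < d_i < t-1$ and that a solution exists.

```python
import math


def expected_degrees(beta):
    """Right-hand side of (1): sum over j != i of e^{b_i+b_j} / (1 + e^{b_i+b_j})."""
    t = len(beta)
    return [
        sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j]))) for j in range(t) if j != i)
        for i in range(t)
    ]


def solve_mle(d, tol=1e-12, max_iter=10000):
    """Fixed-point iteration for (1); assumes 0 < d_i < t - 1 and that a solution exists."""
    t = len(d)
    beta = [0.0] * t
    for _ in range(max_iter):
        new = []
        for i in range(t):
            s = sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                    for j in range(t) if j != i)
            # (1) rearranged: beta_i = log d_i - log sum_{j != i} 1/(e^{-beta_j} + e^{beta_i})
            new.append(math.log(d[i]) - math.log(s))
        if max(abs(a - b) for a, b in zip(new, beta)) < tol:
            return new
        beta = new
    raise RuntimeError("fixed-point iteration did not converge")


# Sanity check: feeding in exact expected degrees should recover the parameters.
true_beta = [-0.4, -0.1, 0.0, 0.2, 0.5]  # hypothetical values
print(solve_mle(expected_degrees(true_beta)))
```

The sanity check uses non-integer "degrees" purely to verify the solver; with observed data, `d` is the integer degree sequence of the graph.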

For the generalized $\beta$-model, Rinaldo, Petrovic and Fienberg (2011) obtained the necessary and sufficient conditions for the existence of the maximum likelihood estimate (MLE) $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_t)$. When $t \to \infty$, Chatterjee, Diaconis and Sly (2011) established the following theorem:

THEOREM 1. Define $L_t = \max_{1 \le i \le t} |\beta_i|$.

(1) If $L_t = o(\log t)$, then $\hat\beta$ uniquely exists with probability tending to one.

(2) If $L_t = o(\log(\log t))$, then

$$\max_{1 \le i \le t} |\hat\beta_i - \beta_i| \le O_p\Big(e^{c_1 e^{c_2 L_t} + c_3 L_t} \sqrt{\frac{\log t}{t}}\Big) = o_p(1),$$

where $c_1$, $c_2$ and $c_3$ are positive constants. Hence $\hat\beta$ is uniformly consistent.
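Denote the covariance matrix of $d = (d_1, \ldots, d_t)$ by $V_t = (v_{i,j})_{t \times t}$, where

$$v_{i,j} = \frac{e^{\beta_i+\beta_j}}{(1+e^{\beta_i+\beta_j})^2}, \quad i, j = 1, \ldots, t;\ i \ne j, \quad \text{and} \quad v_{i,i} = \sum_{j \ne i} v_{i,j}, \quad i = 1, \ldots, t.$$

Note that $V_t$ is also the Fisher information matrix for $\beta$. To establish the asymptotic normality of $\hat\beta$, we first obtain an accurate approximation of $V_t^{-1}$. Let $S_t = (s_{i,j})_{t \times t}$, where

$$s_{i,j} = \frac{\delta_{i,j}}{v_{i,i}} - \frac{1}{v_{\cdot\cdot}}, \qquad (2)$$

$\delta_{i,j}$ is the Kronecker delta function and $v_{\cdot\cdot} = \sum_{i,j=1;\, i \ne j}^{t} v_{i,j}$. In the following proposition, whose proof is given in Appendix 1, we obtain an upper bound on the error of using $S_t$ to approximate $V_t^{-1}$.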



PROPOSITION 1.

$$\|V_t^{-1} - S_t\| = O\Big(\frac{e^{6L_t}}{(t-1)^2}\Big), \qquad (3)$$

where $\|A\| = \max_{i,j} |a_{i,j}|$ for a matrix $A = (a_{i,j})$.
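The quality of the approximation in Proposition 1 can be checked numerically for small $t$. The sketch below builds $V_t$ and $S_t$ from their definitions, inverts $V_t$ with a plain Gauss-Jordan routine, and reports the maximum entrywise error together with the error scaled by $(t-1)^2$. The parameter values and helper names are assumptions chosen for illustration.

```python
import math


def fisher_information(beta):
    """V_t: v_ij = e^{b_i+b_j} / (1 + e^{b_i+b_j})^2 for i != j, v_ii = sum_{j != i} v_ij."""
    t = len(beta)
    v = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for j in range(t):
            if i != j:
                p = 1.0 / (1.0 + math.exp(-(beta[i] + beta[j])))
                v[i][j] = p * (1.0 - p)
        v[i][i] = sum(v[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    return v


def approx_inverse(v):
    """S_t from (2): s_ij = delta_ij / v_ii - 1 / v.., with v.. the sum of off-diagonal entries."""
    t = len(v)
    v_dd = sum(v[i][j] for i in range(t) for j in range(t) if i != j)
    return [[(1.0 / v[i][i] if i == j else 0.0) - 1.0 / v_dd for j in range(t)]
            for i in range(t)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (plain Python, small matrices only)."""
    n = len(a)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def max_abs_error(beta):
    """The norm ||V_t^{-1} - S_t|| = max_{i,j} |entry| used in Proposition 1."""
    v = fisher_information(beta)
    inv, s = inverse(v), approx_inverse(v)
    t = len(beta)
    return max(abs(inv[i][j] - s[i][j]) for i in range(t) for j in range(t))


for t in (10, 20, 40):
    beta = [0.5 * (2.0 * (i + 1) / t - 1.0) for i in range(t)]  # hypothetical parameters
    err = max_abs_error(beta)
    print(t, err, err * (t - 1) ** 2)  # the scaled error should stay bounded as t grows
```

In the homogeneous case $\beta = 0$ the inverse is available in closed form, which gives a convenient check of the routine: for $t = 10$ the maximum error is $17/36 - 2/5$.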
Since

$$\frac{e^{2L_t}}{(1+e^{2L_t})^2} \le v_{i,j} = \frac{e^{\beta_i+\beta_j}}{(1+e^{\beta_i+\beta_j})^2} \le \frac{1}{4}, \quad i \ne j, \qquad (4)$$

we have

$$\frac{(t-1)e^{2L_t}}{(1+e^{2L_t})^2} \le v_{i,i} \le \frac{t-1}{4}, \quad i = 1, 2, \ldots, t.$$

If $e^{L_t} = o((t-1)^{1/2})$, then

$$\max_{1 \le i \le t} \frac{1}{v_{i,i}} \le \frac{(1+e^{2L_t})^2}{(t-1)e^{2L_t}} = o(1).$$

Noticing that each $d_i$ is a sum of $t-1$ independent binomial random variables, it is easy to get the following proposition by the central limit theorem for the bounded case (Loève, 1977, page 289).

PROPOSITION 2. If $e^{L_t} = o((t-1)^{1/2})$, then for any fixed $r \ge 1$, as $t \to \infty$, the first $r$ elements of $S_t(d - E(d))$ are jointly asymptotically normal with mean zero and covariance structure given by $(G_t)_{r \times r}$, where $G_t = \mathrm{diag}(v_{1,1}, v_{2,2}, \ldots, v_{t,t})$.

We now establish a central limit theorem for the MLE in the $\beta$-model, whose proof is given in Appendix 2.

THEOREM 2. If $L_t = o(\log(\log t))$, then for any fixed $r \ge 1$, as $t \to \infty$, the first $r$ elements of $\hat\beta - \beta$ are jointly asymptotically normal with mean zero and covariance structure given by $(G_t)_{r \times r}$.

3. Numerical examples

We first conduct simulation studies to illustrate our theoretical results. By Theorem 2, we construct approximate 95% confidence intervals for $\beta_i$ and $\beta_i - \beta_j$. We report the coverage probabilities for certain $\beta_i - \beta_j$ and the average coverage probabilities (ACP) for $\beta_i$, $i = 1, \ldots, t$, as well as the probabilities that the MLE does not exist. Let $\beta_i = 2iL_t/t - L_t$, $i = 1, \ldots, t$, with $L_t$ chosen to be $0$, $\log(\log t)$, $(\log t)^{1/2}$ and $\log t$, respectively. From Table 1 we see that when $L_t = 0$ or $\log(\log t)$, the coverage probabilities are very close to the nominal level, indicating the adequacy of the constructed confidence intervals. When $L_t = (\log t)^{1/2}$ or $\log t$, the MLE fails to exist with nonzero probability and the coverage probabilities deviate considerably from the nominal level. This demonstrates that the condition on $L_t$ in Theorem 2 is critical in ensuring the existence of the MLE and its asymptotic normality.
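To see why large $L_t$ is problematic in this design, it helps to look at the extreme edge probabilities it implies. The sketch below generates $\beta_i = 2iL_t/t - L_t$ for the four choices of $L_t$ and prints the edge probabilities between the two smallest and the two largest parameters. It also includes a simple sufficient check for non-existence of the MLE: equations (1) cannot hold with finite $\hat\beta$ when some $d_i$ equals $0$ or $t-1$. The function names are hypothetical, and this is not the simulation code behind Table 1.

```python
import math


def design_beta(t, big_l):
    """beta_i = 2 i L_t / t - L_t for i = 1..t, as in the simulation design."""
    return [2.0 * i * big_l / t - big_l for i in range(1, t + 1)]


def l_choices(t):
    """The four choices of L_t considered in the text."""
    return {
        "0": 0.0,
        "log(log t)": math.log(math.log(t)),
        "(log t)^(1/2)": math.sqrt(math.log(t)),
        "log t": math.log(t),
    }


def edge_probability(bi, bj):
    """Edge probability of the beta-model for a pair with parameters bi and bj."""
    return 1.0 / (1.0 + math.exp(-(bi + bj)))


def mle_cannot_exist(d):
    """Sufficient (not necessary) check: (1) has no finite solution if some d_i is 0 or t - 1."""
    t = len(d)
    return any(di == 0 or di == t - 1 for di in d)


t = 50
for name, big_l in l_choices(t).items():
    beta = design_beta(t, big_l)
    p_min = edge_probability(beta[0], beta[1])    # pair with the two smallest parameters
    p_max = edge_probability(beta[-1], beta[-2])  # pair with the two largest parameters
    print(name, round(p_min, 4), round(p_max, 4))
```

With $L_t = \log t$ the extreme probabilities are very close to 0 and 1, so isolated or fully connected vertices become likely, which is consistent with the non-existence frequencies reported for that column.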
[Table 1 about here]

Next, we analyze the food web dataset in Blitzstein and Diaconis (2009), which contains 33 organisms in the Chesapeake Bay; each organism is represented by a vertex in the graph. As in Blitzstein and Diaconis (2009), we study the simple graph after omitting the self-loop at vertex 19.

[Table 2 about here]
144 
The degree sequence $d$ of the graph is

d = (7, 8, 5, 1, 1, 2, 8, 10, 4, 2, 4, 5, 3, 6, 7, 3, 2, 7, 6, 1, 2, 9, 6, 1, 3, 4, 6, 3, 3, 3, 2, 4, 4).

The estimated influence parameters along with their standard errors are reported in Table 2. The largest four degrees are 8, 8, 10, 9 for vertices 2, 7, 8, 22, which also have the largest four estimated influence parameters $-0.083$, $-0.083$, $0.275$, $0.102$ in Table 2. On the other hand, the four vertices with the smallest estimated influence parameter $-2.602$ all have degree 1. This indicates that the larger the influence parameter of a vertex, the more it is linked with the other vertices, as described by the $\beta$-model.
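As a rough check, the estimates can be recomputed from the degree sequence alone, since the degrees are sufficient for $\beta$. The sketch below solves the likelihood equations (1) for the food web degrees by a fixed-point iteration obtained from rearranging (1). The solver, its tolerances and the function names are assumptions made for illustration rather than the authors' code.

```python
import math

# Degree sequence of the food web graph, copied from the text above.
d = [7, 8, 5, 1, 1, 2, 8, 10, 4, 2, 4, 5, 3, 6, 7, 3, 2, 7, 6, 1, 2,
     9, 6, 1, 3, 4, 6, 3, 3, 3, 2, 4, 4]


def solve_mle(d, tol=1e-10, max_iter=100000):
    """Fixed-point iteration for (1); assumes 0 < d_i < t - 1 and that the MLE exists."""
    t = len(d)
    beta = [0.0] * t
    for _ in range(max_iter):
        new = []
        for i in range(t):
            s = sum(1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                    for j in range(t) if j != i)
            # (1) rearranged: beta_i = log d_i - log sum_{j != i} 1/(e^{-beta_j} + e^{beta_i})
            new.append(math.log(d[i]) - math.log(s))
        if max(abs(a - b) for a, b in zip(new, beta)) < tol:
            return new
        beta = new
    raise RuntimeError("fixed-point iteration did not converge")


def fitted_degrees(beta):
    """Expected degrees under the fitted model, i.e. the right-hand side of (1)."""
    t = len(beta)
    return [
        sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j]))) for j in range(t) if j != i)
        for i in range(t)
    ]


beta_hat = solve_mle(d)
# At the MLE the fitted degrees reproduce the observed degrees.
resid = max(abs(f - di) for f, di in zip(fitted_degrees(beta_hat), d))
print([round(b, 3) for b in beta_hat[:4]], resid)
```

The printed values for the first four vertices can be compared with the corresponding entries of Table 2.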
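Appendix 1

Proof of Proposition 1. Define $m = \min_{1 \le i < j \le t} v_{i,j}$ and $M = \max_{1 \le i < j \le t} v_{i,j}$. By (4), we have $m \ge e^{2L_t}/(1+e^{2L_t})^2$ and $M \le 1/4$. Denote the $t \times t$ identity matrix by $I_t$. Write $F_t = V_t^{-1} - S_t = (f_{i,j})$, $R_t = (r_{i,j}) = I_t - V_t S_t$ and $W_t = (w_{i,j}) = S_t R_t$. We have the recursion

$$F_t = (V_t^{-1} - S_t)(I_t - V_t S_t) + S_t(I_t - V_t S_t) = F_t R_t + W_t,$$

and it follows that, for any $i$,

$$f_{i,j} = \sum_{k=1}^{t} f_{i,k}\Big[(\delta_{k,j} - 1)\frac{v_{k,j}}{v_{j,j}} + \frac{2v_{k,k}}{v_{\cdot\cdot}}\Big] + w_{i,j}, \quad j = 1, \ldots, t.$$

Fixing $i$, let $f_{i,\alpha} = \max_{1 \le k \le t} f_{i,k}$ and $f_{i,\beta} = \min_{1 \le k \le t} f_{i,k}$. Since $2\sum_{k=1}^{t} f_{i,k} v_{k,k} = 1$, we have $f_{i,\beta} \le 1/(2v_{\cdot\cdot})$ and $f_{i,\alpha} \ge 0$. By calculation, it can be shown that

$$\max\{|w_{i,j}|, |w_{i,j} - w_{i,k}|\} \le \frac{M}{m^2(t-1)^2} \quad \text{for all } i, j, k, \qquad (A1)$$

and

$$f_{i,\alpha} - f_{i,\beta} = \sum_{k=1}^{t} (f_{i,k} - f_{i,\beta})\Big[(1 - \delta_{k,\beta})\frac{v_{k,\beta}}{v_{\beta,\beta}} - (1 - \delta_{k,\alpha})\frac{v_{k,\alpha}}{v_{\alpha,\alpha}}\Big] + w_{i,\alpha} - w_{i,\beta}. \qquad (A2)$$

Define $a = M/\{m^2(t-1)^2\}$, $\Omega = \{k : (1 - \delta_{k,\beta})v_{k,\beta}/v_{\beta,\beta} > (1 - \delta_{k,\alpha})v_{k,\alpha}/v_{\alpha,\alpha}\}$ and $|\Omega| = \lambda$. It follows that

$$\sum_{k \in \Omega} (f_{i,k} - f_{i,\beta})\Big[(1 - \delta_{k,\beta})\frac{v_{k,\beta}}{v_{\beta,\beta}} - (1 - \delta_{k,\alpha})\frac{v_{k,\alpha}}{v_{\alpha,\alpha}}\Big] \le (f_{i,\alpha} - f_{i,\beta}) \sum_{k \in \Omega}\Big[\frac{(1 - \delta_{k,\beta})v_{k,\beta}}{v_{\beta,\beta}} - \frac{(1 - \delta_{k,\alpha})v_{k,\alpha}}{v_{\alpha,\alpha}}\Big] \le (f_{i,\alpha} - f_{i,\beta}) f(\lambda), \qquad (A3)$$

where

$$f(\lambda) = \frac{\lambda M}{\lambda M + (t-1-\lambda)m} - \frac{(\lambda-1)m}{(\lambda-1)m + (t-\lambda)M}.$$

Note that $f(\lambda)$ takes its maximum at $\lambda = t/2$ when $\lambda \in [1, t-1]$ and $f(t/2) = \{tM - (t-2)m\}/\{tM + (t-2)m\}$. By (A1), (A2) and (A3),

$$f_{i,\alpha} - f_{i,\beta} \le \frac{tM - (t-2)m}{tM + (t-2)m} \times (f_{i,\alpha} - f_{i,\beta}) + a.$$

Hence,

$$f_{i,\alpha} - f_{i,\beta} \le \frac{M(tM + (t-2)m)}{2(t-2)m^3(t-1)^2},$$

and

$$f_{i,\alpha} \le \frac{M(tM + (t-2)m)}{2(t-2)m^3(t-1)^2} + \frac{1}{2v_{\cdot\cdot}} \le \frac{M(tM + (t-2)m)}{2(t-2)m^3(t-1)^2} + \frac{1}{2m(t-1)^2} = O\Big(\frac{e^{6L_t}}{(t-1)^2}\Big).$$

Since $f_{i,\alpha} \ge 0$ and $f_{i,\beta} \le 1/(2v_{\cdot\cdot})$, the same bound holds for $|f_{i,\beta}|$, which proves (3). □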

Appendix 2

We first prove two lemmas.

LEMMA 1. Let $F_t = V_t^{-1} - S_t$ and $U_t = \mathrm{Cov}\{F_t(d - E(d))\}$. Then

$$\|U_t\| \le \|V_t^{-1} - S_t\| + \frac{(1+e^{2L_t})^4}{4e^{4L_t}(t-1)^2}. \qquad (A4)$$

Proof. Note that

$$U_t = F_t V_t F_t^T = (V_t^{-1} - S_t) - S_t(I_t - V_t S_t),$$

and

$$(S_t(I_t - V_t S_t))_{i,j} = \frac{(\delta_{i,j} - 1)v_{i,j}}{v_{i,i}v_{j,j}} + \frac{1}{v_{\cdot\cdot}}.$$

By (4),

$$|(S_t(I_t - V_t S_t))_{i,j}| \le \max\Big\{\frac{(1+e^{2L_t})^4}{4e^{4L_t}(t-1)^2}, \frac{(1+e^{2L_t})^2}{e^{2L_t}t(t-1)}\Big\} \le \frac{(1+e^{2L_t})^4}{4e^{4L_t}(t-1)^2}.$$

Thus,

$$\|U_t\| \le \|V_t^{-1} - S_t\| + \|S_t(I_t - V_t S_t)\| \le \|V_t^{-1} - S_t\| + \frac{(1+e^{2L_t})^4}{4e^{4L_t}(t-1)^2}. \quad \square$$

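LEMMA 2. Assume that Theorem 1 (2) holds. Then, for any $i$,

$$\hat\beta_i - \beta_i = (V_t^{-1}(d - E(d)))_i + o_p(t^{-1/2}). \qquad (A5)$$

Proof. By Theorem 1 (2), we know that

$$\lambda_t = \max_{1 \le i \le t} |\hat\beta_i - \beta_i| = O_p\Big(e^{c_1 e^{c_2 L_t} + c_3 L_t}\sqrt{\frac{\log t}{t}}\Big).$$

Let $\gamma_{i,j} = \hat\beta_i + \hat\beta_j - \beta_i - \beta_j$. By Taylor expansion, for any $i \ne j$,

$$\frac{e^{\hat\beta_i+\hat\beta_j}}{1+e^{\hat\beta_i+\hat\beta_j}} - \frac{e^{\beta_i+\beta_j}}{1+e^{\beta_i+\beta_j}} = \frac{e^{\beta_i+\beta_j}}{(1+e^{\beta_i+\beta_j})^2}\gamma_{i,j} + h_{i,j},$$

where

$$h_{i,j} = \frac{e^{\beta_i+\beta_j+\theta_{i,j}\gamma_{i,j}}\big(1 - e^{\beta_i+\beta_j+\theta_{i,j}\gamma_{i,j}}\big)}{2\big(1+e^{\beta_i+\beta_j+\theta_{i,j}\gamma_{i,j}}\big)^3}\gamma_{i,j}^2,$$

and $0 \le \theta_{i,j} \le 1$. Rewrite (1) as

$$d - E(d) = V_t(\hat\beta - \beta) + h,$$

where $h = (h_1, \ldots, h_t)^T$ and $h_i = \sum_{j \ne i} h_{i,j}$. Equivalently,

$$\hat\beta - \beta = V_t^{-1}(d - E(d)) - V_t^{-1}h. \qquad (A6)$$

Since $|e^x(1 - e^x)/(1 + e^x)^3| \le 1$, we have

$$|h_{i,j}| \le |\gamma_{i,j}|^2/2 \le 2\lambda_t^2,$$

and

$$|h_i| \le \sum_{j \ne i} |h_{i,j}| \le 2(t-1)\lambda_t^2.$$

Note that

$$(S_t h)_i = \frac{h_i}{v_{i,i}} - \frac{1}{v_{\cdot\cdot}}\sum_{j=1}^{t} h_j \quad \text{and} \quad (V_t^{-1}h)_i = (S_t h)_i + (F_t h)_i.$$

By calculation, we have

$$|(S_t h)_i| \le 8\lambda_t^2(1+e^{2L_t})^2 = O_p\Big(e^{2c_1 e^{c_2 L_t} + (2c_3+4)L_t} \times \frac{\log t}{t}\Big),$$

and, by Proposition 1,

$$|(F_t h)_i| \le \|F_t\| \times (t \max_i |h_i|) \le O(e^{6L_t} \times \lambda_t^2) \le O_p\Big(e^{2c_1 e^{c_2 L_t} + (2c_3+6)L_t} \times \frac{\log t}{t}\Big).$$

If $L_t = o(\log(\log t))$, then $|(V_t^{-1}h)_i| \le |(S_t h)_i| + |(F_t h)_i| = o_p(t^{-1/2})$. This completes the proof. □

Proof of Theorem 2. By (A6),

$$(\hat\beta - \beta)_i = (S_t(d - E(d)))_i + (F_t(d - E(d)))_i - (V_t^{-1}h)_i.$$

By Lemmas 1 and 2, if $L_t = o(\log(\log t))$, then

$$(\hat\beta - \beta)_i = (S_t(d - E(d)))_i + o_p(t^{-1/2}).$$

Theorem 2 follows directly from Proposition 2. □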
Table 1. Coverage probabilities and probabilities that the MLE does not exist (in parentheses).

t     (i,j)/ACP    L_t = 0     L_t = log(log t)    L_t = (log t)^{1/2}    L_t = log t
50    (1,50)       0.953(0)    0.953(0)            0.98(0.081)            0(1)
      (25,26)      0.951(0)    0.952(0)            0.955(0.081)           0(1)
      (49,50)      0.951(0)    0.956(0)            0.992(0.081)           0(1)
      ACP          0.950(0)    0.953(0)            0.96(0.081)            0(1)
100   (1,100)      0.954(0)    0.949(0)            0.978(0.004)           0(1)
      (50,51)      0.952(0)    0.943(0)            0.96(0.004)            0(1)
      (99,100)     0.953(0)    0.955(0)            0.981(0.004)           0(1)
      ACP          0.952(0)    0.951(0)            0.957(0.004)           0(1)
200   (1,200)      0.948(0)    0.954(0)            0.954(0)               0(1)
      (100,101)    0.954(0)    0.948(0)            0.945(0)               0(1)
      (199,200)    0.951(0)    0.947(0)            0.966(0)               0(1)
      ACP          0.951(0)    0.951(0)            0.953(0)               0(1)



Table 2. The food web dataset: the estimated influence parameters and their standard errors (in parentheses).

Vertex  Estimate (s.e.)    Vertex  Estimate (s.e.)    Vertex  Estimate (s.e.)    Vertex  Estimate (s.e.)
1       -0.285 (2.233)     2       -0.083 (2.332)     3       -0.754 (1.981)     4       -2.602 (0.977)
5       -2.602 (0.977)     6       -1.853 (1.349)     7       -0.083 (2.332)     8        0.275 (2.486)
9       -1.041 (1.816)     10      -1.853 (1.349)     11      -1.041 (1.816)     12      -0.754 (1.981)
13      -1.389 (1.612)     14      -0.506 (2.118)     15      -0.285 (2.233)     16      -1.389 (1.612)
17      -1.853 (1.349)     18      -0.285 (2.233)     19      -0.506 (2.118)     20      -2.602 (0.977)
21      -1.853 (1.349)     22       0.102 (2.415)     23      -0.506 (2.118)     24      -2.602 (0.977)
25      -1.389 (1.612)     26      -1.041 (1.816)     27      -0.506 (2.118)     28      -1.389 (1.612)
29      -1.389 (1.612)     30      -1.389 (1.612)     31      -1.853 (1.349)     32      -1.041 (1.816)
33      -1.041 (1.816)















